 classification module



EEGDM: EEG Representation Learning via Generative Diffusion Model

Puah, Jia Hong, Goh, Sim Kuan, Zhang, Ziwei, Ye, Zixuan, Chan, Chow Khuen, Lim, Kheng Seang, Fong, Si Lei, Woon, Kok Sin, Guan, Cuntai

arXiv.org Artificial Intelligence

While the electroencephalogram (EEG) has been a crucial tool for monitoring the brain and diagnosing neurological disorders (e.g., epilepsy), learning meaningful representations from raw EEG signals remains challenging due to limited annotations and high signal variability. Recently, EEG foundation models (FMs) have shown promising potential by adopting transformer architectures and self-supervised pre-training methods from large language models (e.g., masked prediction) to learn representations from diverse EEG data, followed by fine-tuning on specific EEG tasks. Nonetheless, these large models often incur high computational costs during both training and inference, with only marginal performance improvements as the model size increases. In this work, we proposed EEGDM, an EEG representation learning framework built upon a generative diffusion model. Specifically, we developed a structured state-space model for diffusion pretraining (SSMDP) to better capture the temporal dynamics of EEG signals and trained it using the Denoising Diffusion Probabilistic Model (DDPM) framework. The resulting latent EEG representations were then used for downstream classification tasks via our proposed latent fusion transformer (LFT). To evaluate our method, we used multi-event datasets covering both interictal epileptiform discharge (TUEV) and seizure (CHB-MIT) detection, and compared EEGDM with current state-of-the-art approaches, including EEG FMs. Empirical results showed that our method outperformed the existing methods. These findings suggested that EEGDM offered a promising alternative to current FMs. Our source code and checkpoint are available at: https://github.com/jhpuah/EEGDM.
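To make the DDPM training signal mentioned above concrete, here is a minimal sketch of the standard DDPM forward (noising) step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. It is not taken from the EEGDM code: the linear schedule, its endpoint values, the function names, and the toy sine-wave "channel" are all assumptions chosen for illustration, and the SSMDP denoiser itself is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (illustrative values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0); return it together with the noise that was added."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# Toy stand-in for one EEG channel: a 10 Hz sine sampled at 256 Hz (assumption).
signal = [math.sin(2 * math.pi * 10 * n / 256) for n in range(256)]
noisy, eps = q_sample(signal, t=500, alpha_bars=alpha_bars, rng=rng)
```

In DDPM training, a denoising network is typically asked to predict `eps` from `noisy` and `t`; which network plays that role here, and how its latents are extracted, is described in the paper rather than in this sketch.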



BERT-based model for Vietnamese Fact Verification Dataset

Tran, Bao, Khanh, T. N., Tuong, Khang Nguyen, Dang, Thien, Nguyen, Quang, Thinh, Nguyen T., Hung, Vo T.

arXiv.org Artificial Intelligence

The rapid advancement of information and communication technology has facilitated easier access to information. However, this progress has also necessitated more stringent verification measures to ensure the accuracy of information, particularly within the context of Vietnam. This paper introduces an approach to address the challenges of Fact Verification using the Vietnamese dataset by integrating both sentence selection and classification modules into a unified network architecture. The proposed approach leverages the power of large language models by utilizing pre-trained PhoBERT and XLM-RoBERTa as the backbone of the network. The proposed model was trained on a Vietnamese dataset, named ISE-DSC01, and demonstrated superior performance compared to the baseline model across all three metrics. Notably, we achieved a Strict Accuracy level of 75.11%, indicating a remarkable 28.83% improvement over the baseline model.


CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models

Hörst, Fabian, Rempe, Moritz, Becker, Helmut, Heine, Lukas, Keyl, Julius, Kleesiek, Jens

arXiv.org Artificial Intelligence

Digital Pathology is a cornerstone in the diagnosis and treatment of diseases. A key task in this field is the identification and segmentation of cells in hematoxylin and eosin-stained images. Existing methods for cell segmentation often require extensive annotated datasets for training and are limited to a predefined cell classification scheme. To overcome these limitations, we propose CellViT++, a framework for generalized cell segmentation in digital pathology. CellViT++ utilizes Vision Transformers with foundation models as encoders to compute deep cell features and segmentation masks simultaneously. To adapt to unseen cell types, we rely on a computationally efficient approach. It requires minimal data for training and leads to a drastically reduced carbon footprint. We demonstrate excellent performance on seven different datasets, covering a broad spectrum of cell types, organs, and clinical settings. The framework achieves remarkable zero-shot segmentation and data-efficient cell-type classification. Furthermore, we show that CellViT++ can leverage immunofluorescence stainings to generate training datasets without the need for pathologist annotations. The automated dataset generation approach surpasses the performance of networks trained on manually labeled data, demonstrating its effectiveness in creating high-quality training datasets without expert annotations. To advance digital pathology, CellViT++ is available as an open-source framework featuring a user-friendly, web-based interface for visualization and annotation. The code is available under https://github.com/TIO-IKIM/CellViT-plus-plus.


Motor Imagery Classification for Asynchronous EEG-Based Brain-Computer Interfaces

Wu, Huanyu, Li, Siyang, Wu, Dongrui

arXiv.org Artificial Intelligence

Motor imagery (MI) based brain-computer interfaces (BCIs) enable the direct control of external devices through the imagined movements of various body parts. Unlike previous systems that used fixed-length EEG trials for MI decoding, asynchronous BCIs aim to detect the user's MI without explicit triggers. They are challenging to implement, because the algorithm needs to first distinguish between resting states and MI trials, and then classify the MI trials into the correct task, all without any triggers. This paper proposes a sliding window prescreening and classification (SWPC) approach for MI-based asynchronous BCIs, which consists of two modules: a prescreening module to separate MI trials from the resting state, and a classification module for MI classification. Both modules are trained with supervised learning followed by self-supervised learning, which refines the feature extractors. Within-subject and cross-subject asynchronous MI classifications on four different EEG datasets validated the effectiveness of SWPC, i.e., it always achieved the highest average classification accuracy, and outperformed the best state-of-the-art baseline on each dataset by about 2%.
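The two-module structure described above can be sketched as a sliding-window loop in which a prescreening step gates a classification step. This is only an illustration of the control flow, not the SWPC implementation: the function names, window sizes, and both toy decision rules are hypothetical stand-ins for the trained modules.

```python
def sliding_windows(signal, window, step):
    """Yield (start_index, samples) for every full window in the stream."""
    for start in range(0, len(signal) - window + 1, step):
        yield start, signal[start:start + window]


def detect_and_classify(signal, window, step, is_mi, classify):
    """Stage 1 screens each window; stage 2 labels only the windows that pass."""
    results = []
    for start, segment in sliding_windows(signal, window, step):
        if is_mi(segment):  # prescreening module (stand-in)
            results.append((start, classify(segment)))  # classification module (stand-in)
    return results


# Toy stand-ins (assumptions): "MI" means high mean absolute amplitude,
# and the class is decided by the sign of the window sum.
def toy_is_mi(segment):
    return sum(abs(v) for v in segment) / len(segment) > 0.5


def toy_classify(segment):
    return "left" if sum(segment) < 0 else "right"


# Rest, then a positive burst, then rest, then a negative burst.
stream = [0.0] * 8 + [1.0] * 8 + [0.0] * 8 + [-1.0] * 8
events = detect_and_classify(stream, window=4, step=4,
                             is_mi=toy_is_mi, classify=toy_classify)
print(events)  # [(8, 'right'), (12, 'right'), (24, 'left'), (28, 'left')]
```

Resting windows never reach the classifier, which is the point of the prescreening stage: the classifier only has to separate MI tasks from one another.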


E2E-AFG: An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation

Jiang, Yun, Xie, Zilong, Zhang, Wei, Fang, Yun, Pan, Shuai

arXiv.org Artificial Intelligence

Retrieval-augmented generation methods often neglect the quality of content retrieved from external knowledge bases, resulting in irrelevant information or potential misinformation that negatively affects the generation results of large language models. In this paper, we propose an end-to-end model with adaptive filtering for retrieval-augmented generation (E2E-AFG), which integrates answer existence judgment and text generation into a single end-to-end framework. This enables the model to focus more effectively on relevant content while reducing the influence of irrelevant information and generating accurate answers. We evaluate E2E-AFG on six representative knowledge-intensive language datasets, and the results show that it consistently outperforms baseline models across all tasks, demonstrating the effectiveness and robustness of the proposed approach.


Multimodal Learning and Reasoning for Visual Question Answering

Ilija Ilievski, Jiashi Feng

Neural Information Processing Systems

Reasoning about entities and their relationships from multimodal data is a key goal of Artificial General Intelligence. The visual question answering (VQA) problem is an excellent way to test such reasoning capabilities of an AI model and its multimodal representation learning. However, current VQA models are oversimplified deep neural networks, composed of a long short-term memory (LSTM) unit for question comprehension and a convolutional neural network (CNN) for learning a single image representation. We argue that the single visual representation contains limited and general information about the image contents and thus limits the model's reasoning capabilities. In this work, we introduce a modular neural network model that learns a multimodal and multifaceted representation of the image and the question. The proposed model learns to use the multimodal representation to reason about the image entities and achieves new state-of-the-art performance on both VQA benchmark datasets, VQA v1.0 and v2.0, by a wide margin.


ECG Arrhythmia Detection Using Disease-specific Attention-based Deep Learning Model

Jin, Linpeng

arXiv.org Artificial Intelligence

The electrocardiogram (ECG) is one of the most commonly used tools to diagnose cardiovascular disease in clinical practice. Although deep learning models have achieved impressive success in automatic ECG analysis, they often lack the interpretability that is essential in healthcare applications. To this end, many schemes such as general-purpose attention mechanisms, the Grad-CAM technique, and ECG knowledge graphs have been proposed for integration with deep learning models. However, they either decrease classification performance or produce explanations that are inconsistent with how cardiologists reason when interpreting an ECG. In this study, we propose a novel disease-specific attention-based deep learning model (DANet) for arrhythmia detection from short ECG recordings. The novel idea is to introduce a soft-coding or hard-coding waveform enhanced module into existing deep neural networks, which amends the original ECG signals under the guidance of the diagnostic rule for a given disease type before they are fed into the classification module. For the soft-coding DANet, we also develop a learning framework combining self-supervised pre-training with two-stage supervised training. To verify the effectiveness of our proposed DANet, we applied it to the problem of atrial premature contraction detection, and the experimental results show that it outperforms the benchmark model. Moreover, it also highlights the waveform regions that deserve special attention in the model's decision-making process, allowing it to serve as a medical diagnostic assistant for physicians.


A Multi-module Robust Method for Transient Stability Assessment against False Label Injection Cyberattacks

Wang, Hanxuan, Lu, Na, Liu, Yinhong, Wang, Zhuqing, Wang, Zixuan

arXiv.org Artificial Intelligence

The success of deep learning in transient stability assessment (TSA) heavily relies on high-quality training data. However, the label information in TSA datasets is vulnerable to contamination through false label injection (FLI) cyberattacks, resulting in degraded performance of deep TSA models. To address this challenge, a Multi-Module Robust TSA method (MMR) is proposed to rectify the supervised training process misguided by FLI in an unsupervised manner. In MMR, a supervised classification module and an unsupervised clustering module are alternately trained to improve the clustering friendliness of representation learning, thereby achieving accurate clustering assignments. Leveraging the clustering assignments, we construct a training label corrector to rectify the injected false labels and progressively enhance robustness and resilience against FLI. However, there is still a gap in accuracy and convergence speed between MMR and FLI-free deep TSA models. To narrow this gap, we further propose a human-in-the-loop training strategy, named MMR-HIL. In MMR-HIL, potential false samples can be detected by modeling the training loss with a Gaussian distribution. From these samples, the most likely false samples and the most ambiguous samples are re-labeled by a TSA-expert-guided bi-directional annotator and then subjected to penalized optimization, aimed at improving accuracy and convergence speed. Extensive experiments indicate that MMR and MMR-HIL both exhibit strong robustness against FLI in TSA performance. Moreover, the contaminated labels can also be effectively corrected, demonstrating the superior resilience of the proposed methods.
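One simple way to read "modeling the training loss with a Gaussian distribution" is to fit a single Gaussian to the per-sample losses and flag samples whose loss sits far above the mean. The sketch below does exactly that and nothing more. The abstract does not state the actual detection criterion used in MMR-HIL, so the z-score rule, the threshold of 2.0, and the function name are assumptions.

```python
import statistics


def flag_suspect_samples(losses, z_threshold=2.0):
    """Return indices whose loss lies more than z_threshold std devs above the mean.

    Assumption: one Gaussian is fitted to all per-sample losses; samples in the
    upper tail are treated as candidates for having a false label.
    """
    mean = statistics.fmean(losses)
    std = statistics.pstdev(losses)
    if std == 0.0:
        return []  # all losses identical, nothing stands out
    return [i for i, loss in enumerate(losses) if (loss - mean) / std > z_threshold]


# Toy per-sample training losses; the last sample is fitted much worse than the rest.
losses = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.10, 0.11, 2.50]
suspects = flag_suspect_samples(losses)
print(suspects)  # [9]
```

Flagged indices would then be candidates for relabeling; how the most likely false and most ambiguous samples are separated, and how the expert-guided annotator handles them, is left to the paper.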